Smooth Wasserstein Distance
Asymptotic Guarantees for Generative Modeling Based on the Smooth Wasserstein Distance
Minimum distance estimation (MDE) has recently gained attention as a formulation of (implicit) generative modeling. It minimizes, over model parameters, a statistical distance between the empirical data distribution and the model distribution. This formulation lends itself well to theoretical analysis, but typical results are hindered by the curse of dimensionality. To overcome this and devise a scalable finite-sample statistical MDE theory, we adopt the framework of the smooth 1-Wasserstein distance (SWD) $\mathsf{W}_1^{(\sigma)}$. The SWD was recently shown to preserve the metric and topological structure of classic Wasserstein distances, while enjoying dimension-free empirical convergence rates. In this work, we conduct a thorough statistical study of the minimum smooth Wasserstein estimators (MSWEs), first proving the estimator's measurability and asymptotic consistency. We then characterize the limit distribution of the optimal model parameters and their associated minimal SWD. These results imply an $O(n^{-1/2})$ generalization bound for generative modeling based on MSWE, which holds in arbitrary dimension. Our main technical tool is a novel high-dimensional limit distribution result for empirical $\mathsf{W}_1^{(\sigma)}$. The characterization of a nondegenerate limit stands in sharp contrast with the classic empirical 1-Wasserstein distance, for which a similar result is known only in the one-dimensional case. The validity of our theory is supported by empirical results, posing the SWD as a potent tool for learning and inference in high dimensions.
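To make the quantity concrete, here is a minimal sketch (not the authors' code) of how the empirical SWD between data and model samples can be approximated in practice: add $N(0, \sigma^2 I_d)$ noise to each sample set, which yields draws from the Gaussian-smoothed measures, and solve discrete OT between the smoothed point clouds. The use of the POT library, the sample sizes, and sigma = 1.0 are illustrative assumptions.

```python
# A minimal sketch of estimating the smooth 1-Wasserstein distance
# W_1^{(sigma)} between two empirical measures via Monte Carlo smoothing
# plus exact discrete OT. Requires the POT library (pip install pot).
import numpy as np
import ot


def smooth_w1(x, y, sigma, rng):
    """Estimate W_1^{(sigma)} between the empirical measures of x and y.

    x: (n, d) samples from P_n;  y: (m, d) samples from Q_m.
    Adding N(0, sigma^2 I) noise to samples of P_n produces samples
    from the convolution P_n * N_sigma (and likewise for Q_m).
    """
    xs = x + sigma * rng.standard_normal(x.shape)
    ys = y + sigma * rng.standard_normal(y.shape)
    # Euclidean ground cost gives the 1-Wasserstein distance between
    # the smoothed empirical measures (uniform weights).
    cost = ot.dist(xs, ys, metric="euclidean")
    a = np.full(len(xs), 1.0 / len(xs))
    b = np.full(len(ys), 1.0 / len(ys))
    return ot.emd2(a, b, cost)


rng = np.random.default_rng(0)
x = rng.standard_normal((500, 5))        # "data" samples
y = rng.standard_normal((500, 5)) + 0.5  # "model" samples
print(smooth_w1(x, y, sigma=1.0, rng=rng))
```

In an MDE loop, `y` would be regenerated from the parametric model at each candidate parameter and the returned value minimized over those parameters.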
Review for NeurIPS paper: Asymptotic Guarantees for Generative Modeling Based on the Smooth Wasserstein Distance
Additional Feedback: The list of remarks and questions I have:
* The current standard for regularization of OT is entropic regularization of the plan (papers of Cuturi [5], and also the sample complexity results [3,4]). This paper seems to mostly ignore that literature, which is quite odd given that the goals are (almost) the same. Since entropic regularization can (should) be viewed as a "cheap proxy" for Gaussian smoothing, a proper and detailed comparison seems in order.
* The authors seem to be using a re-sampling scheme: sampling from P_n and N_sigma and adding the obtained values produces samples from P_n * N_sigma. But a potential problem is that, to cope with the CoD, this might require a number of samples exponential in the dimension.
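For reference, a hedged sketch of the comparison the reviewer asks for: entropic OT (Sinkhorn) on the raw samples next to unregularized OT on Gaussian-smoothed samples. Treating the entropic regularization strength `reg` as playing the role of sigma is an assumption made purely for illustration, not an equivalence claimed by the paper or the review.

```python
# Side-by-side sketch: entropic regularization of the plan (Cuturi-style
# Sinkhorn) versus Gaussian smoothing of the marginals followed by exact OT.
import numpy as np
import ot

rng = np.random.default_rng(0)
x = rng.standard_normal((500, 5))
y = rng.standard_normal((500, 5)) + 0.5
a = np.full(500, 1.0 / 500)
b = np.full(500, 1.0 / 500)

# Entropic regularization of the transport plan on the raw samples.
cost = ot.dist(x, y, metric="euclidean")
w1_entropic = ot.sinkhorn2(a, b, cost, reg=1.0)

# Gaussian smoothing: add N(0, sigma^2 I) noise, then solve exact OT.
sigma = 1.0
cost_s = ot.dist(x + sigma * rng.standard_normal(x.shape),
                 y + sigma * rng.standard_normal(y.shape),
                 metric="euclidean")
w1_smooth = ot.emd2(a, b, cost_s)
print(w1_entropic, w1_smooth)
```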
Meta-Review for NeurIPS paper: Asymptotic Guarantees for Generative Modeling Based on the Smooth Wasserstein Distance
The reviewers agree that this is a good paper that deserves acceptance. The contributions are useful from a statistical point of view. They also agree that the computational limitations should be stated more upfront: the idea of Gaussian smoothing is of limited interest to the NeurIPS community unless one has an efficient algorithm for solving optimal transport between the smoothed densities, which is not yet the case (any method based purely on discretization, as proposed here, inevitably suffers from the curse of dimensionality). The authors mention in the rebuttal that one idea is to parameterize the dual variable with a neural network, but this leads to an object that is very different from the SWD, since neural networks have inductive biases. For these reasons, I recommend accept (poster).
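As an illustration of the rebuttal idea mentioned above, the following PyTorch sketch maximizes the Kantorovich dual objective E[f(X)] - E[f(Y)] over a neural potential f between smoothed samples, with a crude WGAN-style weight-clipping Lipschitz constraint. The architecture, optimizer, and clipping value are assumptions for illustration; as the meta-review notes, the resulting quantity is biased by the network class and differs from the exact SWD.

```python
# Sketch: neural parameterization of the Kantorovich dual potential for
# W_1 between Gaussian-smoothed samples (WGAN-style weight clipping as a
# rough Lipschitz constraint).
import torch
import torch.nn as nn


def dual_w1_estimate(x, y, sigma=1.0, steps=2000, clip=0.05):
    x = x + sigma * torch.randn_like(x)  # samples from P_n * N_sigma
    y = y + sigma * torch.randn_like(y)  # samples from Q_m * N_sigma
    f = nn.Sequential(nn.Linear(x.shape[1], 128), nn.ReLU(),
                      nn.Linear(128, 128), nn.ReLU(),
                      nn.Linear(128, 1))
    opt = torch.optim.RMSprop(f.parameters(), lr=5e-4)
    for _ in range(steps):
        loss = -(f(x).mean() - f(y).mean())  # maximize the dual objective
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():  # crude Lipschitz control via weight clipping
            for p in f.parameters():
                p.clamp_(-clip, clip)
    with torch.no_grad():
        return (f(x).mean() - f(y).mean()).item()


x = torch.randn(500, 5)
y = torch.randn(500, 5) + 0.5
print(dual_w1_estimate(x, y))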